Script and snakemake rule to pull external data #190

Merged
merged 8 commits into from
Sep 10, 2024

Conversation

maartenbrinkerink
Collaborator

@maartenbrinkerink maartenbrinkerink commented Sep 5, 2024

Description

-Added 'external_files.py' that pulls all external files based on retrievable URLs.
-Added a rule to the preprocess snakemake file to run said script.
-Removed all try/except statements from other scripts.
-Changed the source for historical electricity demand from Our World In Data to EMBER (the latter is updated yearly).

@trevorb1 On my end this is functional, i.e. you can remove the different PLEXOS files and run the workflow. The added rule will redownload the files first before doing anything else. Note that for now I have listed the relevant file names in both the 'external_files.py' script and the preprocess snakemake file. If there is a more efficient way to do this, let me know.

Issue Ticket Number

#119
#189

Documentation

-Set up an 'external_files' script that pulls external data (where possible).
-Altered the preprocess snakemake file to run the 'external_files' script and generate missing external files as required for other rules.
-Tested with two files from the Harvard Dataverse.
-Added missing file to external_files.py.
-Removed external file retrievals from all other scripts.
-Updated preprocess snakefile.
-Now use historical electricity data from EMBER for the regression underlying the demand projections.
-EMBER data only runs from 2020 but will be updated year by year going forward, whereas the Our World in Data dataset was static. The projected demand values and associated model results are comparable, with a slightly better R2 based on the EMBER data.
-Added the EMBER dataset to the 'external_files.py' script and associated snakemake rule.
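
For illustration, the retrieval step described above could be sketched roughly as follows. This is a minimal sketch, not the contents of 'external_files.py': the function name `fetch_missing`, the path-to-URL dictionary layout, and the choice to skip files that already exist are all assumptions made for the example.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve


def fetch_missing(
    files: dict[str, str],
    download: Callable[[str, str], object] = urlretrieve,
) -> list[str]:
    """Download every entry of `files` (local path -> URL) that is not on disk.

    Returns the list of paths that were actually downloaded.
    `fetch_missing` is a hypothetical helper, not part of the PR.
    """
    fetched = []
    for path, url in files.items():
        target = Path(path)
        if target.exists():
            continue  # assumption: files already present are left untouched
        # Create the destination folder on first use.
        target.parent.mkdir(parents=True, exist_ok=True)
        download(url, str(target))
        fetched.append(path)
    return fetched
```

The `download` parameter defaults to `urllib.request.urlretrieve` and can be swapped for a stub, which makes the logic testable without network access.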
Member

@trevorb1 trevorb1 left a comment

@maartenbrinkerink thanks for getting this! Everything looks good on my side. The only changes I made are:

  • Moved the retrieve rule to retrieve.smk file to hold retrieval rules.
  • Removed the hardcoding of the files in the external_files.py data
  • Removed the retrieved data from the repository. As users will now just automatically download this data! :)

These are just suggestions - please feel free to revert anything you don't like!

@@ -0,0 +1,36 @@

Member

Created a separate rule file, as more data retrieval rules will probably need to get added soon. Just helps with code organization! :)

    message:
        "Downloading external files..."
    params:
        files = get_external_links()
Member

I removed the hard-coded portion in the external_files.py script and pass in the files/urls through snakemake. If running the script directly (for debugging), you can only pass in one file + url right now.
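
As a rough illustration of the two modes described in this comment, the sketch below resolves the file/URL mapping either from a Snakemake object or from a single command-line pair. The function name `resolve_external_files` and the positional arguments `file` and `url` are hypothetical; the actual command-line interface of the script is not shown in this conversation.

```python
import argparse
from typing import Optional, Sequence


def resolve_external_files(
    snakemake_obj: Optional[object] = None,
    argv: Optional[Sequence[str]] = None,
) -> dict[str, str]:
    """Return a mapping of local file path -> URL.

    With a Snakemake object, the mapping is taken from `params.files`.
    Otherwise exactly one file/URL pair is read from the command line.
    """
    if snakemake_obj is not None:
        return dict(snakemake_obj.params.files)

    # Hypothetical debugging interface: one destination path and one URL.
    parser = argparse.ArgumentParser(description="Download one external file.")
    parser.add_argument("file", help="local path to write")
    parser.add_argument("url", help="URL to download from")
    args = parser.parse_args(argv)
    return {args.file: args.url}
```

Inside a script run by Snakemake, this could be called as `resolve_external_files(globals().get("snakemake"))`, since Snakemake injects a global named `snakemake` into scripts it executes.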

if __name__ == "__main__":

    if "snakemake" in globals():
        external_files = snakemake.params.files
Member

When passing in arguments directly through snakemake, we can pass in data structures like dictionaries (whereas the command line only accepts strings unless we add a parser). This is how we get around the hard coding of the files and improve reusability!
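
To make the contrast concrete: a dictionary passed through `snakemake.params` arrives in the script as a dictionary, while the same mapping given on the command line arrives as a string and has to be parsed first. The sketch below shows one possible parser, assuming the mapping is supplied as a JSON object; the function name `files_from_cli_string` and the JSON convention are assumptions for illustration, not something the PR implements.

```python
import json


def files_from_cli_string(raw: str) -> dict[str, str]:
    """Parse a JSON object passed as one command-line string into a dict.

    Hypothetical helper: shows the extra parsing step that a plain
    command-line interface would need for a path -> URL mapping.
    """
    parsed = json.loads(raw)
    if not isinstance(parsed, dict):
        raise ValueError("expected a JSON object mapping file paths to URLs")
    return {str(path): str(url) for path, url in parsed.items()}
```

With Snakemake, none of this is needed, because `snakemake.params.files` can be used as a dictionary directly.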

@trevorb1
Member

Also added the resources/data folder to .gitignore so downloaded data isn't accidentally added back to the repo.

@maartenbrinkerink maartenbrinkerink merged commit 4c58860 into master Sep 10, 2024
0 of 6 checks passed
@maartenbrinkerink
Collaborator Author

Thanks @trevorb1, the changes work for me!

@maartenbrinkerink maartenbrinkerink deleted the external-data-script branch September 10, 2024 15:15